Validation of the Strengths and Difficulties Questionnaire (SDQ) emotional subscale in assessing depression and anxiety across development

Emotional disorders are common in childhood, and their prevalence sharply increases during adolescence. The Strengths and Difficulties Questionnaire (SDQ) is widely used for screening emotional and behavioural difficulties in children and young people, but little is known about the accuracy of the emotional subscale (SDQ-E) in detecting emotional disorders, and whether this changes over development. Such knowledge is important in determining whether symptom changes across age are due to developmental or measurement differences. This study assessed the validity of the SDQ-E and two individual items (low mood and general worry) in differentiating between cases and non-cases of Major Depressive Disorder (MDD), Generalised Anxiety Disorder (GAD), and other anxiety disorders across ages 7, 10, 13, 15, and 25 years in a UK population cohort. Analyses showed moderate accuracy of the subscale in discriminating cases of MDD (AUC = 0.67–0.85), and high accuracy for discriminating cases of GAD (AUC = 0.80–0.93) and any anxiety disorder (AUC = 0.74–0.83) compared to non-cases. The SDQ-E performed well across ages and sex, and generally performed better than the two individual items. Together our findings validate the SDQ-E as a screen for emotional disorders during childhood, adolescence, and early adulthood, and as a tool for longitudinal research on depression and anxiety disorders.


Introduction
Emotional disorders like depression and anxiety are common in childhood, and their prevalence sharply rises in adolescence and early adult life [1]. In order to understand how emotional problems develop in young people, it is crucial that symptoms are assessed repeatedly using comparable measures. Yet a current barrier to longitudinal research on the developmental course of emotional disorders is that different measures of depression and anxiety are typically used for adolescents and adults. This makes it difficult to determine whether any differences are developmental or due to measurement changes. The Strengths and Difficulties Questionnaire (SDQ) is the most widely used screen for emotional and behavioural difficulties in children and young people [2]. It includes five subscales (emotional symptoms, hyperactivity/inattention, conduct and peer relationship problems, prosocial behaviour) that can be either combined into a total difficulties score, or investigated as separate subscales. Separate subscales can also be combined to create internalising and externalising scales.
The SDQ emotional subscale (SDQ-E) is made up of five items relating to depression, worry, fear, nervousness, and somatic symptoms. The measure has previously been validated against diagnostic measures of depression or anxiety disorders (e.g. [3]), but it is unclear whether the SDQ-E (or its constituent items) show different patterns of association with depression and anxiety at different developmental time points.
Much of the research on the validity of the SDQ in identifying depression or anxiety disorders has focused on either the full SDQ scale [4], the emotional subscale in childhood alone [5], or has grouped children and adolescents into one age category [2,3]. While such research has provided initial promising evidence for the SDQ-E in detecting emotional disorders, we know little about the validity of the SDQ-E in detecting Major Depressive Disorder (MDD) and anxiety disorders across different ages. Such knowledge is important for longitudinal research that aims to retain the same measure to assess the developmental course of depression and anxiety. Understanding whether one scale, or a reduced set of items, present similar levels of accuracy across development could also help confirm existing research on changes in incidence, which show that MDD and some anxiety disorders have a marked increase in prevalence across adolescence, particularly for females [1,6]. If measurement scales used to assess these disorders are valid across development, we can be confident that symptom changes are developmental and not due to measurement changes.
The aims of the current study were to assess and compare the validity of the emotional subscale and individual items that reflect core features of depression or anxiety (low mood and general worry respectively) for detecting diagnoses of DSM-5 [7] MDD, Generalised Anxiety Disorder (GAD), as well as other anxiety disorders across development. To do this we use a UK population cohort that includes participants who have been assessed repeatedly using the SDQ, and have completed research diagnostic instruments designed to generate clinical diagnoses of depression and anxiety. The study represents the first of its kind to test (a) the discriminative validity of the emotional subscale in detecting depressive and anxiety disorders across development, and (b) whether individual depressive or worry items from the emotional subscale are able to differentiate those with depressive or anxiety disorders from those without, at levels similar to those for the full SDQ emotional subscale. Doing so could aid future developmental research in selecting appropriate measures across developmental periods. This could prove especially important for improving the assessment and monitoring of these disorders across development and during the transition from child and adolescent to adult health services.

Sample
Data were taken from the Avon Longitudinal Study of Parents and Children (ALSPAC), a prospective, longitudinal birth cohort study based in the UK [8,9]. Pregnant women based in the Avon area of England, with an expected delivery date between 1st April 1991 and 31st December 1992, were eligible for the study [8]. Of these, 20,248 pregnancies were identified as being eligible and the initial number of pregnancies enrolled was 14,541, with 13,988 children alive at 1 year of age. The sample for analyses using variables from age 7 and onwards include 15,447 pregnancies following further follow-up [10]. Of these, 14,901 children were alive at 1 year of age. Study data from 22 years were collected and managed using REDCap electronic data capture tools hosted at the University of Bristol [11]. REDCap (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies. Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool: http://www.bristol.ac. uk/alspac/researchers/our-data/. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Details of the samples included can be found in S1 and S2 Tables. Where families included multiple births, we only included the oldest sibling, as per previous research on this sample [12].

Measures
The Strengths and Difficulties Questionnaire emotional subscale (SDQ-E) and the Development and Wellbeing Assessment (DAWBA) were administered on five occasions: at approximate ages 7 years (mean ages: 81 and 91 months respectively), 10 years (115 and 128 months), 13 years (157 months and 166 months), 15/16 years (198 and 185 months), and at 25 years.
Strengths and Difficulties Questionnaire (SDQ). The Strengths and Difficulties Questionnaire (SDQ) comprises 25 items that fall under five subsections; emotional symptoms, conduct problems, hyperactivity/inattention, peer relationship problems, and prosocial behaviour [2]. Each section is made up of 5 items rated on a three-point scale ('Not true', 'Somewhat true' or 'Certainly true'). Analyses focused on items from the emotional subscale which include, "Often unhappy, down-hearted or tearful"; "Complains of headache/stomach ache"; "Many worries, often seems worried"; "Nervous or clingy in new situations, easily loses confidence"; and "Many fears, easily scared". In line with the SDQ recommendations (www. sdqinfo.org), items were summed to generate an overall emotional score that ranges from 0 to 10, with mean imputation used for those with (�2) of items missing. Further analyses focused on two core symptoms of depression and generalised anxiety, "Often unhappy, down-hearted or tearful" and "Many worries, often seems worried".
The SDQ was administered and completed by the main carer in ALSPAC at the approximate ages of 7, 10, 13, 16, and 25 years, as well as by self-report at age 25. This allowed validation against the depressive and anxiety diagnoses at similar timepoints.
Diagnoses. Major Depressive Disorder (MDD) and Generalised Anxiety Disorder (GAD), as well as a range of other anxiety disorders (see Table 1 for details), were captured using the Development and Well-Being Assessment (DAWBA; [13]). DAWBA is a structured interview designed to generate psychiatric diagnoses. Versions of DAWBA are available for parents or carers, teachers, and the individual to complete. In ALSPAC, the interview questions were administered online for participants to complete and diagnoses were generated using computer-generated probability bands. All diagnoses were based on algorithms generated according to DSM-IV, except at age 25 as this assessment was conducted after the release of the DSM-5 [7]. The computerised algorithm predicts the likelihood of a clinical rater assigning each child a diagnosis (see www.DAWBA.com for more information). The six-band computer prediction generated from these is then used to create a binary diagnosis variable according to whether the probability of a diagnosis is greater than 50%. This approach has been previously validated in a separate sample [14].
For the present study, we used parent-ratings from the DAWBA, available at 7, 10, and 13 years, as well as self-reported DAWBA from the depression section at 15 and 25 years, and for GAD and other anxiety disorders at 15 years. Questions on MDD refer to the previous four weeks, while questions related to GAD refer to the last 6 months. Items specific to phobias refer to instances in the last month. It is worth noting that the gap between the DAWBA diagnoses and the SDQ assessments varied slightly across the different ages (see Supplementary). We focused on whether individuals met the criteria for MDD, GAD, or any form of anxiety disorder assessed at each time point (see Table 1). In addition, as a secondary analysis to compare the performance of the emotional subscale for emotional and non-emotional disorders, we also used the parent DAWBA at 16 years to identify children without emotional disorders but who experienced other disorders including Attention Deficit Hyperactivity Disorder (ADHD) or any behavioural disorder (conduct disorder, oppositional defiant disorder). A full list of the DAWBA diagnoses included can be found in Table 1. It should be noted that there are minor differences in the anxiety disorders included at age 15 according to informant (see Table 1). Specifically, self-reports of 'any anxiety disorder' do not include separation anxiety disorder (which is included in the parent-rated 'any anxiety disorder' variable but they include the addition of panic disorder and agoraphobia). As the DSM-5 no longer places Post-Traumatic Stress Disorder (PTSD) or Obsessive Compulsive Disorder (OCD) as anxiety disorders, these were removed in the generation of our 'any anxiety' disorder category at every age.

Analyses
All analyses were conducted in Stata version 17 and used Receiver Operating Characteristic (ROC) curve analyses. These enabled tests of the ability of the SDQ emotional subscale to distinguish between cases and non-cases of MDD, GAD, any form of anxiety disorder, and other disorder (ADHD or behavioural disorder not emotional disorder). A ROC curve was plotted (sensitivity vs 1-specificity), and the area under the curve (AUC) estimated. The area under the curve (AUC) was used to determine the diagnostic efficiency of the emotional subscale score in identifying individuals meeting the depressive or anxiety diagnostic criteria. An AUC value of 0.5 indicates performance at chance level, therefore values <0.7 are considered as having low test accuracy. Estimates between 0.7 and 0.9 are considered moderate, and those >0.9 as excellent when screening questionnaire performance against gold standard diagnostic measures [15]. It is worth noting that in some instances, such as prognostic prediction, different AUC values are sometimes considered relevant. Sensitivity (the ability of the screening questionnaire to correctly identify those with a diagnosis) and specificity (the ability of the screening questionnaire to correctly identify those without a diagnosis) estimates were derived from the ROC curve analyses and used to explore optimum cut-off scores for the full emotional subscale (range 0 to 10). Optimum cut-points for diagnosis were selected as those that best balanced sensitivity and specificity according to maximal Youden Index (sensitivity + specificity-1) [16]. These tests of sensitivity and specificity form part of the Standards for Reporting Diagnostic Accuracy (STARD) initiative [17,18].
Positive predictive values (PPV) and negative predictive values (NPV) were calculated for the optimal cut-point to show the probability that those exceeding the cut-point have a diagnosis, and the probability that those not meeting the cut-point do not have a diagnosis, respectively. These predictive values are dependent on sample prevalence rates. Following this, analyses investigated two individual items from the emotional subscale to determine whether a) the low mood item was better at discriminating MDD compared to the worry item, and b) whether the worry item was better at discriminating GAD compared to the low mood item. This was done using Stata's roccomp function [19].

Sensitivity analyses
Sensitivity analyses explored the ability of the emotional subscale score, and two individual items in detecting cases of MDD, GAD, and any anxiety disorder among males and females separately. These analyses were also run using Stata's roccomp function which allows comparison of AUC values by sex.
Missing data. To examine whether any differences in findings across time points could be reflective of different patterns of missing data across ages rather than developmental differences, supplementary analyses used multiple imputation. Multiple imputation by chained equations (MICE) was used to impute missing diagnostic and SDQ data at each time point for individuals who had data at least once on the SDQ, and who completed at least one DAWBA assessment (n = 9,241). One hundred imputed datasets were produced, and estimates were pooled across these 100 datasets. All analysis variables were included in the imputation models alongside auxiliary variables previously shown to predict missingness in the ALSPAC cohort (see S3 Table for full list). Imputation was conducted using Predictive Mean Matching (PMM) [20]. This matches the predictive mean distance of incomplete observations with the complete observations. Datasets were imputed using the 'mi estimate' and 'eroctab' commands in Stata which fit a model to each of the imputed data sets and pool individual results using Rubin's combination rules [21].

Descriptives
The internal consistency of SDQ-E scores averaged 0.69, which is slightly above previous findings of 0.67 [2], and there was some evidence of improvement with age (α = 0.63 at 7 years, α = 0.68 at 10 years, α = 0.67 at 13 years, α = 0.71 at 16 years, α = 0.77 at 25 years). Total scores on the emotional subscale remained relatively stable across childhood and adolescence, but were somewhat higher in young adulthood (particularly for self-reports) (S1 Table). Females scored more highly than males at all time points (see S2 Table), as anticipated based on prevalence rates of emotional disorders between sexes [1]).
For the diagnoses in our study, the prevalence of both depressive and anxiety disorders varied across development, with both MDD and GAD most common at the oldest age assessed (25 years for MDD and 15 years for GAD), and least common at 7 years (see S1 Table). Varying rates of diagnosis were noted across males and females according to age and disorder (see Table 2). Note that while not tested specifically, there was a general trend in which males had higher rates of depression and anxiety than females in childhood, but the prevalence was greater for females compared to males in adolescence.

Validation of the emotional subscale
The Receiver Operating Characteristic (ROC) curves are shown in Fig 1 for MDD and Fig 2 for GAD (see S1 and S2 Figs for any anxiety and ADHD/any behavioural disorder outcomes). These analyses showed that the emotional subscale had moderate accuracy in discriminating between cases and non-cases of MDD (AUC range = 0.67-0.85), and high accuracy in discriminating cases of GAD (AUC range = 0.80-0.93) and any anxiety disorder (0.74-0.83) from non-cases (see Table 3). Accuracy tended to be higher for GAD and any anxiety disorder across all time points compared to MDD. No clear pattern was observed across development for any of the disorder outcomes and there were overlapping confidence intervals across all time points for each disorder, suggesting no differences by age. In particular, comparing parent reports (available across age) there was no evidence for any consistent increase or decrease in accuracy with age. As expected, and providing evidence of discriminant validity, the SDQ-E performed less well at identifying other disorders (ADHD or behavioural) in the absence of an emotional disorder (AUC range = 0.61-0.70). Results from the imputed dataset revealed largely consistent findings (S4 Table).
Sensitivity and specificity values for all possible SDQ emotional subscale cut-points are reported in S5 -S7 Tables. Optimal cut points, which balanced sensitivity and specificity, were  generally lower for MDD diagnoses in childhood and adolescence compared to those for GAD. For MDD, a score of 3 or higher captured between 52% and 81% of MDD diagnoses across development, while a score of 4 or 5 for GAD captured between 77% and 88% of cases across development (see S8 Table). When investigating the two individual items (low mood and general worry), these did not perform as well as the full emotional subscale in detecting depressive or anxiety diagnoses respectively ( Table 3). The low mood item demonstrated greater accuracy than the general worry item in detecting MDD at 25 years when self-reported. The low mood and worry items showed similar accuracy in detecting cases of GAD, but the worry item performed better for detecting cases of any anxiety disorder at 7 and 13 years (see Table 3).

PLOS ONE
Validation of the SDQ emotional subscale in assessing depression and anxiety across development

Sensitivity analyses
We did not find consistent sex differences in the accuracy of the SDQ-E in identifying cases of MDD, GAD or other anxiety disorders (see Table 4 and also S9 and S10 Tables).

Discussion
This study aimed to examine the validity of the emotional subscale of the SDQ in identifying depression and anxiety diagnoses across development. Our findings provide validation that the subscale can be used to distinguish between those meeting the diagnostic criteria for MDD, GAD, or any other anxiety disorder from those who do not. This was the case for disorders across all ages, and for both males and females. Such findings provide important validation for a time-efficient tool that can be used by researchers studying depressive or anxiety disorders across development.
The accuracy of the emotional subscale for distinguishing between cases and non-cases was high for GAD and anxiety disorders, and moderate for MDD, in line with previous research on children [5], 2015). Our findings extend these to provide the first evidence that validity is largely stable across time for both depression and anxiety, with AUC values ranging from 0.72 to 0.85 for MDD, from 0.80 to 0.93 across development for GAD, and from 0.74 to 0.83 for any anxiety disorder. Accuracy was also stable for both sexes, suggesting that while the presentation of symptoms may vary with age and sex, the items within the emotional subscale capture a similar construct that can be examined fairly across development and for males and females.
The AUC values in the present study balanced specificity and sensitivity. This is the typical approach used in research whereby an individual is categorised as having a disorder in one step. False positives and false negatives in this instance are therefore equally undesirable. In a clinical setting, a two-step process is often used such that a questionnaire or interview is followed by further assessments. These initial assessments may be more accepting of false positives and favour a cut-point that prioritises sensitivity over specificity. During the second diagnostic step, clinicians may then instead favour specificity over sensitivity [5]. Variable cut-  points may therefore be required depending on the rationale for using the instrument, as outlined in the STARD statement [17]. Our findings provide insight into specific cut-points recommended for the SDQ-E to detect depression and anxiety disorders across development. Current recommendations for the SDQ-E suggest that in childhood and adolescence, parent-rated emotional scores of 5 or 6 capture the top 10% of 'abnormal' scores in the population, while a score of 4 captures those with slightly raised or 'borderline' symptoms. Our results suggest a lower cut-point may be needed to capture clinical symptoms of MDD in childhood and adolescence when using parent-ratings. In particular, findings revealed that a score of 3 or higher captured between 52% and 81% of MDD diagnoses across development, compared to a score of 4 or 5 for GAD, which captured between 77% and 88% of cases across development. When using self-reports to capture MDD at 25 years, a score of 5 was the optimal cut-off point (see S10 Table for more detail). Note that the optimal cut-point will also depend on the setting and whether it is more desirable to avoid false positives or false negatives.

Strengths and limitations
By using a longitudinal design, our study overcomes previous limitations related to the generalisability of results from childhood to adolescence and beyond. This is crucial for determining whether the same construct can be used in research focused on depressive and anxiety disorders across development. Several limitations of this study, however, should be noted.
First, there was a gap of 9-13 months between the SDQ and DAWBA assessments at each age. Estimates of the validity of the SDQ-E are therefore likely conservative. This may present a particular issue for MDD which is often episodic, and may explain why the SDQ-E appeared to perform more poorly for MDD compared to anxiety. Indeed, at 25 years where the age gap between the completion of the SDQ and DAWBA was minimal, the accuracy for depression was higher. It is also possible that accuracy was lower during late adolescence because of differences in informant. At 15/16 years, parent and self-report were available for the SDQ-E and DAWBA diagnoses respectively. All analyses prior to this relied on parent-reports for both the SDQ-E and DAWBA assessment. Future validation studies should attempt to replicate our findings using both parent and adolescent self-reports of diagnoses and the SDQ-E. This multi-informant approach is commonly used in practice, and it would help to ensure findings do not underreport the frequency or severity of emotional problems, and would minimise bias that could arise from using different informants for different assessments. Second, future research should also seek to use clinical diagnoses. As per most epidemiological studies on depression and anxiety, diagnoses in the current study were not generated using clinical interviews with trained interviewers, meaning information available to derive diagnoses was more limited.
Other considerations relate to the prevalence and comorbidity of emotional disorders. The overall prevalence of MDD, GAD, and any anxiety diagnosis was relatively low, likely due to the young age and stringent DAWBA assessments. Co-morbidity of depressive and anxiety disorders were also common, and there were varying rates noted for females and males (S11 Table). This aligns with previous research which has shown that compared to adolescents who are not depressed, those with a depression diagnosis are six to 12 times more likely to also have anxiety [22]. Rates of comorbidity may have altered the accuracy of the subscale in detecting cases of anxiety or depression, particularly as more individuals had GAD with MDD compared to having MDD with anxiety. Further sensitivity analyses removing comorbid cases would have been underpowered, but should be considered in larger validation studies.
Finally, ALSPAC like other prospective birth cohorts experienced non-trivial participant drop-out over time. Previous work has shown that children with increased risk for mental health problems and more disadvantaged families were less likely to participate in the study in childhood and adolescence [8,23]. However, sensitivity analyses accounting for attrition across all follow-up time points through multiple imputation showed closely similar findings.

Conclusion
Overall, our findings suggest the emotional subscale of the SDQ is an appropriate measure to determine the risk of emotional disorders like depression and anxiety among males and females across childhood and adolescence, and in adulthood for depression. This provides important validation for the use of this subscale in monitoring longitudinal changes in emotional problems across development within research. Retaining the same measure to study the developmental course of depressive and anxiety disorders is especially important when studying changes in incidence and outcomes as it helps to ensure differences are less likely a result of measurement changes. Our findings could therefore aid the assessment and monitoring of those at risk of depressive or anxiety disorders to both reduce their prevalence and associated long-term effects.